feat: Feather (Arrow IPC) positional point-lookup sidecar#28
Conversation
FeatherLookupProvider + FeatherSidecarBuilder behind a feather-provider feature: a sparse-rowid positional sidecar for DuckLake vector search, served through the existing PointLookupProvider trait. Hydration is binary-search (partition_point) over the sorted rowid column plus an exact-match guard, then take(positions) on verbatim Arrow columns - no SQLite/serde_json dep and no Arrow<->SQL type mapping. The builder sorts at finish, so correctness does not depend on the DuckLake scan order. Tests cross-check row/key/order/projection parity against the SQLite provider on the supported types, and verify lossless round-trip of types SQLite rejects (Decimal/FixedSizeList/Struct/Dictionary/nested List).
finish() sorted rows via sort_to_indices on the raw key column (native, i.e. signed, order for Int64/Int32), while the open-time verification and fetch_by_keys binary-search compare keys cast as u64. The two orderings diverge for keys whose i64 and u64 order differ (e.g. negative Int64), so a genuinely-ascending build was rejected by from_batches as 'corrupt'. Argsort the u64-cast keys instead, so the on-disk order, the open-time sortedness check, and the search all use one domain for every supported key type. Tighten the sortedness guard to strictly-ascending (reject duplicate keys, matching SQLite's INTEGER PRIMARY KEY) and factor the null-checked u64 key extraction into a shared helper. Drop stray PoC framing from the module docs.
Add tests for the open() rejection paths (unsorted / duplicate keys, malformed and missing files), the TableProvider::scan round-trip, UInt64 key columns, and a regression test that negative-but-ascending Int64 keys build and round-trip through the u64 lookup domain.
| let col_indices: Vec<usize> = match projection { | ||
| None => (0..self.schema.fields().len()).collect(), | ||
| Some(idxs) => idxs.to_vec(), | ||
| }; | ||
| let out_schema: SchemaRef = match projection { | ||
| None => self.schema.clone(), | ||
| Some(idxs) => Arc::new(arrow_schema::Schema::new( | ||
| idxs.iter() | ||
| .map(|&i| self.schema.field(i).clone()) | ||
| .collect::<Vec<_>>(), | ||
| )), | ||
| }; | ||
|
|
||
| let pos_arr = UInt64Array::from(positions); | ||
| let cols: Vec<ArrayRef> = col_indices | ||
| .iter() | ||
| .map(|&i| { | ||
| compute::take(self.batch.column(i).as_ref(), &pos_arr, None) | ||
| .map_err(|e| DataFusionError::ArrowError(Box::new(e), None)) | ||
| }) | ||
| .collect::<DFResult<_>>()?; |
There was a problem hiding this comment.
super nit: an out-of-range projection index panics here (self.schema.field(i) and self.batch.column(i) index-out-of-bounds), whereas the sibling SqliteLookupProvider/HashKeyProvider use RecordBatch::project, which returns a DataFusionError instead. Projections are planner-generated and always valid in normal operation, so this isn't reachable today — but matching the siblings' graceful-error behavior would be more defensive. (not blocking)
There was a problem hiding this comment.
Reviewed the Feather/Arrow IPC point-lookup provider. The u64-domain consistency fix (build sort / open-time check / search all in one domain) is sound and well-justified, and test coverage is strong — SQLite parity, type fidelity, sort-agnostic build, open() rejection paths, and key-type variants. One non-blocking super nit left inline. LGTM.
Note: I was unable to run cargo test/clippy in the review environment (command approval-gated), so this approval is based on code inspection; please ensure the pre-push checklist passes.
Make FeatherLookupProvider memory-comparable to the SQLite sidecar (which pages its B-tree) instead of fully resident: - READ: mmap the file and decode Arrow IPC zero-copy (FileDecoder over a Buffer backed by the mapping). Column buffers point into the mapped file, so a scattered take() faults in only the touched pages; only the K-row output is heap-allocated. A first-key-per-batch coarse index narrows each lookup to one (now multi-batch) file segment before the in-batch binary search. - BUILD: bounded-memory external sort. push_batch spills rows UNSORTED to a temp IPC file (O(one batch) payload resident), keeping only an O(N)*8B key index in memory; finish argsorts the keys and gathers rows in sorted order from the mmap'd temp via interleave, in bounded chunks, into the final sorted file. No full-payload buffering and no dependency on the input scan's row order (the DuckLake rowid scan order is not reliable). Also: reject out-of-range projection indices with a DataFusionError instead of panicking (matches the sibling providers; addresses the PR #28 nit), and reject duplicate keys at build/open. Adds a multi-batch shuffled-build parity test exercising the coarse index + cross-batch gather. New deps: memmap2, tempfile.
Address the mmap/spill-rewrite review: - open(): guard the footer length against the file size and range-check every IPC block offset/length before slicing, so a corrupt/foreign file returns a typed error instead of panicking (no-panic-on-open contract). Must-fix. - finish(): publish via write-to-sibling-temp + atomic rename instead of truncating the final path in place, so a concurrent reader's live mmap never observes a truncated inode (SIGBUS/torn reads); also crash-safe. - finish(): use usize sort indices (drops the >4.29B-row u32 truncation). - Docs: correct the build-memory wording (u64 keys + usize permutation, not payload), the read-path heap-vs-page-in nuance, and note the single-dictionary constraint for Dictionary payload columns. - Tests: corrupt-footer rejection, empty/single-row builds, exact chunk-boundary gather, and cross-push duplicate-key rejection.
There was a problem hiding this comment.
Reviewed the mmap/IPC decode path, the sorted-key lookup, the spill-sort build, and the test suite. Length/offset guards, u64 ordering-domain consistency, key/row alignment, and the atomic publish all check out. The prior projection-panic super-nit is addressed. Tests are thorough and meaningful. LGTM.
Adds
FeatherLookupProvider+FeatherSidecarBuilderbehind a newfeather-providerfeature — an Arrow/Feather (Arrow IPC) positional payload sidecar served through the existingPointLookupProvidertrait, as a sibling to the SQLite and Parquet providers. Targets the DuckLake vector-search path (sparse, holeyrowids).What it does
rowid.FileDecoderover aBufferbacked by the mapping). A first-key-per-batch coarse index narrows a lookup to one batch;partition_pointbinary-searches that batch's rowid column; an exact-match guard rejects a missing rowid;takepulls the matched rows. A scatteredtakefaults in only the touched pages, so resident heap is bounded by the working set — memory-comparable to SQLite paging its B-tree (no full-file load).push_batchstreams rows UNSORTED to a temp IPC file (O(one batch) payload resident), keeping only an O(N)-key index in memory;finishargsorts the keys and gathers rows in sorted order from the mmap'd temp viainterleave, in bounded chunks, into a temp file that is atomically renamed into place. No full-payload buffering and no dependency on the input scan's row order (DuckLakerowidscan order is not reliable).Why
rusqlite/bundled-libsqlite3 +serde_jsontax for this path.Tests
SQLite-parity on supported types (sparse-rowid key sets incl. edges, duplicates, absent keys, empty; single- and multi-batch with cross-boundary keys), lossless round-trip of the SQLite-rejected types, projection (graceful out-of-range error), sort-agnostic/shuffled build,
open()rejection paths (unsorted/duplicate/malformed/missing/corrupt-footer),TableProvider::scan, multiple key types, and empty/single-row/exact-chunk-boundary builds. Also verified end-to-end on a live server against a real DuckLake snapshot.Known limitation (for review)
Dictionarypayload columns whose dictionary differs across input batches are rejected at build time (the IPC writer requires a consistent dictionary across the spilled batches) rather than silently corrupted — documented in the module header. Single-dictionary (the common case) round-trips fine.Scope / follow-ups
Compress-at-rest + expand-on-NVMe for storage/egress; mount demand-paging / partial S3-mount fetch; removing the
sqlite-providerfeature/rusqlitedep once the parquet backend (its only remaining user) is removed.